Creators/Authors contains: "Ranaweera, Thilanka"

  1. Abstract

    Natural language processing (NLP) techniques can enhance our ability to interpret plant science literature. Many state-of-the-art algorithms for NLP tasks require high-quality labelled data in the target domain, in which entities like genes and proteins, as well as the relationships between entities, are labelled according to a set of annotation guidelines. While such datasets exist for other domains, these resources need development in the plant sciences. Here, we present the Plant ScIenCe KnowLedgE Graph (PICKLE) corpus, a collection of 250 plant science abstracts annotated with entities and relations, along with its annotation guidelines. The annotation guidelines were refined by iterative rounds of overlapping annotations, in which inter-annotator agreement was leveraged to improve the guidelines. To demonstrate PICKLE’s utility, we evaluated the performance of pretrained models from other domains and trained a new, PICKLE-based model for entity and relation extraction (RE). The PICKLE-trained models exhibit the second-highest in-domain entity performance of all models evaluated, as well as an RE performance that is on par with other models. Additionally, we found that computer science-domain models outperformed models trained on a biomedical corpus (GENIA) in entity extraction, which was unexpected given the intuition that biomedical literature is more similar to PICKLE than computer science literature is. Upon further exploration, we established that the inclusion of new types on which the models were not trained substantially impacts performance. The PICKLE corpus is, therefore, an important contribution to training resources for entity extraction and RE in the plant sciences.

     
  2. Switchgrass lowland ecotypes have significantly higher biomass but lower cold tolerance than upland ecotypes. Understanding the molecular mechanisms underlying cold response, including those at the transcriptional level, can contribute to improving the tolerance of high-yield switchgrass under chilling and freezing conditions. Here, by analyzing an existing switchgrass transcriptome dataset, we computationally dissect the temporal cis-regulatory basis of the switchgrass transcriptional response to cold. We found that the number of cold-responsive genes and enriched Gene Ontology terms increased as the duration of cold treatment increased from 30 min to 24 hours, suggesting an amplified response or cascading effect in cold-responsive gene expression. To identify genomic sequences likely important for regulating cold response, machine learning models predictive of cold response were established using k-mer sequences enriched in the genic and flanking regions of cold-responsive genes but not of non-responsive genes. These k-mers, referred to as putative cis-regulatory elements (pCREs), are likely regulatory sequences of cold response in switchgrass. In total, there are 655 pCREs, of which 54 are important at all cold treatment time points. Consistent with this, eight of 35 known cold-responsive CREs were similar to top-ranked pCREs in the models, and only these eight were important for predicting temporal cold response. More importantly, most of the top-ranked pCREs were novel sequences in cold regulation. Our findings point to previously unknown sequence elements important for cold-responsive regulation that warrant further study.
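
The k-mer enrichment step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test was used, so the one-sided Fisher's exact test, the function names, the significance threshold, and the toy sequences below are all assumptions made for illustration and are not the authors' pipeline. The sketch covers only the enrichment step, not the machine learning models built on the enriched k-mers.

```python
from collections import Counter
from math import comb


def kmers_in(seq, k):
    """Return the set of distinct k-mers present in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test p-value (assumed test, see lead-in).

    a: responsive sequences containing the k-mer
    b: responsive sequences lacking it
    c: non-responsive sequences containing it
    d: non-responsive sequences lacking it
    """
    n_with = a + c          # all sequences containing the k-mer
    n_resp = a + b          # all responsive sequences
    total = a + b + c + d
    denom = comb(total, n_resp)
    p = 0.0
    # Sum the hypergeometric tail: at least `a` responsive sequences with the k-mer.
    for x in range(a, min(n_with, n_resp) + 1):
        p += comb(n_with, x) * comb(total - n_with, n_resp - x) / denom
    return p


def enriched_kmers(responsive, non_responsive, k, alpha=0.05):
    """Return {k-mer: p-value} for k-mers enriched among responsive sequences."""
    resp_counts = Counter(km for s in responsive for km in kmers_in(s, k))
    non_counts = Counter(km for s in non_responsive for km in kmers_in(s, k))
    results = {}
    for km, a in resp_counts.items():
        c = non_counts.get(km, 0)
        p = fisher_enrichment_p(a, len(responsive) - a, c, len(non_responsive) - c)
        if p < alpha:
            results[km] = p
    return results


# Toy (hypothetical) flanking sequences, not real switchgrass data.
responsive = ["ACGTCCGAC", "TTCCGACAA", "GGCCGACTT", "CCGACGGTA"]
non_responsive = ["ACGTACGTA", "TTTTAAAAC", "GGGTTTAAA", "CATCATCAT"]
result = enriched_kmers(responsive, non_responsive, k=5)
```

In a real analysis, the p-values would also need multiple-testing correction, because many k-mers are tested at once.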
  3. Premise

    Leaf morphology is dynamic, continuously deforming during leaf expansion and among leaves within a shoot. Here, we measured the leaf morphology of more than 200 grapevines (Vitis spp.) over four years and modeled changes in leaf shape along the shoot to determine whether a composite leaf shape comprising all the leaves from a single shoot can better capture the variation and predict species identity compared with individual leaves.

    Methods

    Using homologous universal landmarks found in grapevine leaves, we modeled various morphological features as polynomial functions of leaf nodes. The resulting functions were used to reconstruct modeled leaf shapes across the shoots, generating composite leaves that comprehensively capture the spectrum of leaf morphologies present.
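
The idea of modeling a morphological feature as a polynomial function of leaf node can be sketched as follows. This is a minimal illustration under stated assumptions: the feature name `lobe_length`, the polynomial degree, the number of sampled positions, and the toy measurements are hypothetical, and the abstract does not specify the fitting procedure the authors used.

```python
import numpy as np


def fit_feature_by_node(nodes, values, degree=2):
    """Least-squares polynomial fit of one feature against node position."""
    return np.polynomial.Polynomial.fit(nodes, values, deg=degree)


def composite_profile(model, first_node, last_node, n_points=5):
    """Evaluate the fitted model at evenly spaced positions along the shoot.

    The modeled values at these positions stand in for the "snapshots"
    that together make up a composite leaf.
    """
    positions = np.linspace(first_node, last_node, n_points)
    return positions, model(positions)


# Toy (hypothetical) measurements: one feature value per leaf node.
nodes = np.arange(1, 11)
lobe_length = 2.0 + 0.5 * nodes - 0.03 * nodes**2

model = fit_feature_by_node(nodes, lobe_length, degree=2)
positions, profile = composite_profile(model, 1, 10, n_points=5)
```

Repeating the fit for each landmark-derived feature would give one modeled value per feature at each sampled position, which is one way to assemble a composite representation of the shoot.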

    Results

    We found that composite leaves are better predictors of species identity than individual leaves from the same plant. We were able to use composite leaves to predict the species identity of previously unassigned grapevines, and these predictions were verified by genotyping.

    Discussion

    Observations of individual leaf shape fail to capture the true diversity between species. Composite leaf shape—an assemblage of modeled leaf snapshots across the shoot—is a better representation of the dynamic and essential shapes of leaves, in addition to serving as a better predictor of species identity than individual leaves.

     